NSF PAR Search | NSF Public Access Repository


Chinese Wall or Swiss Cheese? Keyword filtering in the Great Firewall of China

https://doi.org/10.1145/3442381.3450076

Rambert, Raymond; Weinberg, Zachary; Barradas, Diogo; Christin, Nicolas (April 2021, Proceedings of the Web Conference 2021)
Full Text Available
Where are you taking me? Understanding Abusive Traffic Distribution Systems

https://doi.org/10.1145/3442381.3450071

Szurdi, Janos; Luo, Meng; Kondracki, Brian; Nikiforakis, Nick; Christin, Nicolas (April 2021, Proceedings of the 30th Web Conference (WWW))
Illicit website owners frequently rely on traffic distribution systems (TDSs) operated by less-than-scrupulous advertising networks to acquire user traffic. While researchers have described a number of case studies on various TDSs or the businesses they serve, we still lack an understanding of how users are differentiated in these ecosystems, how different illicit activities frequently leverage the same advertisement networks and, subsequently, the same malicious advertisers. We design ODIN (Observatory of Dynamic Illicit ad Networks), the first system to study cloaking, user differentiation and business integration at the same time in four different types of traffic sources: typosquatting, copyright-infringing movie streaming, ad-based URL shortening, and illicit online pharmacy websites. ODIN performed 874,494 scrapes over two months (June 19, 2019–August 24, 2019), posing as six different types of users (e.g., mobile, desktop, and crawler) and accumulating over 2 TB of data. We observed 81% more malicious pages compared to using only the best-performing crawl profile by itself. Three of the traffic sources we study redirect users to the same traffic broker domain names up to 44% of the time and all of them often expose users to the same malicious advertisers. Our experiments show that novel cloaking techniques could decrease by half the number of malicious pages observed. Worryingly, popular blacklists do not just suffer from a lack of coverage and delayed detection, but miss the vast majority of malicious pages targeting mobile users. We use these findings to design a classifier, which can make precise predictions about the likelihood of a user being redirected to a malicious advertiser.
Full Text Available
Self-Supervised Euphemism Detection and Identification for Content Moderation

https://doi.org/10.1109/SP40001.2021.00075

Zhu, Wanzheng; Gong, Hongyu; Bansal, Rohan; Weinberg, Zachary; Christin, Nicolas; Fanti, Giulia; Bhat, Suma (May 2021, Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP))
Full Text Available
Self-Supervised Euphemism Detection and Identification for Content Moderation

Zhu, Wanzheng; Gong, Hongyu; Bansal, Rohan; Weinberg, Zachary; Christin, Nicolas; Fanti, Giulia; Bhat, Suma (January 2021, 2021 IEEE Symposium on Security and Privacy (SP))
Fringe groups and organizations have a long history of using euphemisms (ordinary-sounding words with a secret meaning) to conceal what they are discussing. Nowadays, one common use of euphemisms is to evade content moderation policies enforced by social media platforms. Existing tools for enforcing policy automatically rely on keyword searches for words on a "ban list", but these are notoriously imprecise: even when limited to swearwords, they can still cause embarrassing false positives. When a commonly used ordinary word acquires a euphemistic meaning, adding it to a keyword-based ban list is hopeless: consider "pot" (storage container or marijuana?) or "heater" (household appliance or firearm?). The current generation of social media companies instead hire staff to check posts manually, but this is expensive, inhumane, and not much more effective. It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is, and therefore whether the message violates policy. Also, when a euphemism is banned, the group that used it need only invent another one, leaving moderators one step behind. This paper will demonstrate unsupervised algorithms that, by analyzing words in their sentence-level context, can both detect words being used euphemistically, and identify the secret meaning of each word. Compared to the existing state of the art, which uses context-free word embeddings, our algorithm for detecting euphemisms achieves 30-400% higher detection accuracy for unlabeled euphemisms in a text corpus. Our algorithm for revealing euphemistic meanings of words is the first of its kind, as far as we are aware. In the arms race between content moderators and policy evaders, our algorithms may help shift the balance in the direction of the moderators.
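The limitation of keyword-based ban lists described in this abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm; it is only the naive baseline the abstract criticizes. The `BAN_LIST` contents and the `flag_message` helper are hypothetical names chosen for illustration.

```python
import re

# Hypothetical ban list, using the two example words from the abstract.
BAN_LIST = {"pot", "heater"}


def flag_message(message: str) -> list[str]:
    """Return the banned words found in a message, ignoring context.

    This is a plain keyword match: it cannot tell an innocent use of a
    word from a euphemistic one, which is the weakness the abstract notes.
    """
    words = re.findall(r"[a-z']+", message.lower())
    return sorted(set(words) & BAN_LIST)


# An innocent sentence is flagged just like a euphemistic one would be.
print(flag_message("I left the pot on the stove"))
```

Because the match looks only at the word itself, any fix has to bring in the surrounding sentence, which is the direction the abstract says the paper takes.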
Full Text Available
ICLab: A Global, Longitudinal Internet Censorship Measurement Platform

https://doi.org/10.1109/SP40000.2020.00014

Niaki, Arian Akhavan; Cho, Shinyoung; Weinberg, Zachary; Hoang, Nguyen Phong; Razaghpanah, Abbas; Christin, Nicolas; Gill, Phillipa (May 2020, IEEE Symposium on Security and Privacy)

Full Text Available
An Empirical Analysis of Traceability in the Monero Blockchain

https://doi.org/10.1515/popets-2018-0025

Möser, Malte; Soska, Kyle; Heilman, Ethan; Lee, Kevin; Heffan, Henry; Srivastava, Shashvat; Hogan, Kyle; Hennessey, Jason; Miller, Andrew; Narayanan, Arvind; et al (June 2018, Proceedings on Privacy Enhancing Technologies)

Monero is a privacy-centric cryptocurrency that allows users to obscure their transactions by including chaff coins, called “mixins,” along with the actual coins they spend. In this paper, we empirically evaluate two weaknesses in Monero’s mixin sampling strategy. First, about 62% of transaction inputs with one or more mixins are vulnerable to “chain-reaction” analysis; that is, the real input can be deduced by elimination. Second, Monero mixins are sampled in such a way that they can be easily distinguished from the real coins by their age distribution; in short, the real input is usually the “newest” input. We estimate that this heuristic can be used to guess the real input with 80% accuracy over all transactions with 1 or more mixins. Next, we turn to the Monero ecosystem and study the importance of mining pools and the former anonymous marketplace AlphaBay on the transaction volume. We find that after removing mining pool activity, there remains a large amount of potentially privacy-sensitive transactions that are affected by these weaknesses. We propose and evaluate two countermeasures that can improve the privacy of future transactions.
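The two weaknesses named in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: each transaction input is modeled as a set of candidate outputs of which exactly one is real, each output can be spent only once, and the names `chain_reaction`, `guess_newest`, and the toy identifiers are hypothetical.

```python
def chain_reaction(rings: dict[str, set[str]]) -> dict[str, str]:
    """Deduce real inputs by elimination.

    rings maps an input id to its set of candidate outputs. When an input
    has a single candidate left, that output must be the real one, so it
    is removed from every other input's candidates. Repeat until stable.
    """
    remaining = {inp: set(cands) for inp, cands in rings.items()}
    deduced: dict[str, str] = {}
    changed = True
    while changed:
        changed = False
        for inp, cands in remaining.items():
            if inp in deduced or len(cands) != 1:
                continue
            (real,) = cands
            deduced[inp] = real
            # The output is now known to be spent here; it can only be
            # a mixin anywhere else.
            for other, other_cands in remaining.items():
                if other != inp:
                    other_cands.discard(real)
            changed = True
    return deduced


def guess_newest(candidates: set[str], height: dict[str, int]) -> str:
    """Guess the real input as the most recently created candidate."""
    return max(candidates, key=lambda out: height[out])


# Toy data: in_a has no mixins, which starts the chain reaction.
rings = {
    "in_a": {"o1"},
    "in_b": {"o1", "o2"},
    "in_c": {"o1", "o2", "o3"},
    "in_d": {"o4", "o5"},
}
print(chain_reaction(rings))
print(guess_newest({"o4", "o5"}, {"o4": 100, "o5": 250}))
```

In the toy data, elimination resolves `in_a`, `in_b`, and `in_c` but leaves `in_d` ambiguous, and the age-based guess is the fallback for such inputs.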
Full Text Available
